Enterprise Storage Management: A Blueprint for the Future

By: Michael Wehrs, Director of Business Development, Conner Software, Lake Mary, Florida

Data storage technology has been one of the driving forces in the proliferation of personal computer technology. In the past ten years we have seen PCs move from the desktop to the backbone of enterprise-wide computing solutions. With that change came a shift in the way data is stored and managed.

In the first generation of personal computers, data storage was still personal: all storage resided in the system on the desktop. Data management consisted largely of periodically backing up critical data files to floppy disk or some low-cost tape drive.

The second generation of network computing systems extended data storage throughout the enterprise. Data resided on the local desktop and also on the network server. The concept of data management was extended to include full-featured backup software that allowed network managers to automate the backup process during off hours, when the network load was minimal. The process became easier, but conceptually the data management function was the same: make a duplicate copy of the hard disk files in case of a failure.

We are now on the verge of the third wave in the personal computing revolution, one that will bring significant change to the way users work and the way information is stored and managed. The concept of "backup" as we know it will be eliminated. Instead, the requirement will be for a new generation of intelligent software that automatically and transparently moves data to the most appropriate storage medium based on cost and access requirements. The framework of this new era is provided by Microsoft Corporation's concept of information at your fingertips. In the data storage world, it will become "information where it belongs, when you need it." Simply put, these software solutions will address the growing storage requirements of users and network administrators by automating the protection, storage and retrieval of data in an efficient and cost-effective manner.

The cost of managing information is often overlooked when configuring storage systems. The cost of the hardware is easily identifiable. But as the amount and complexity of stored data increase, the costs associated with managing that information are also skyrocketing. It is estimated that by 1995, 80 percent of all business PCs will be connected to LANs, double today's figure, and the storage capacity of LANs is doubling annually. According to a Dataquest report, it takes an average of 1,200 hours per year to administer a single network server. Without dramatic improvement in the administrator's ability to manage information, doing so will soon become cost-prohibitive.

The objective, then, is to create a storage environment that provides LAN administrators with a robust set of tools to simplify and automate their job. Storage management encompasses a full range of services, providing capabilities far beyond simple backup and restore. Indeed, backup is no longer the issue. The fundamental objective of this technology is to develop the most effective method of storing, protecting and retrieving critical information. This new storage management model dictates the development of software and hardware components to address the need for automated data storage and retrieval.

The goal is to migrate data to the most cost-effective storage medium that meets the access-speed requirements of the user. Information will be automatically relocated onto the most appropriate storage medium through a variety of processes and storage services transparent to the user. The model includes a hardware storage server and the software component that Conner Software has designated "Storage Manager." To the user, the hardware storage server is best understood not as a single physical device but as a single virtual device, composed of multiple storage devices physically located anywhere throughout the enterprise, indeed anywhere in the world if required. The location is irrelevant to the user, and so is the medium on which the information resides. In conjunction with the intelligent Storage Manager application, when a save command is issued, the storage system will automatically write the file to the appropriate medium based on a set of pre-defined guidelines: hard disk storage, a near-line medium such as rewritable optical disk, or off-line storage in an automated tape library.
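To make such guidelines concrete, consider a minimal sketch, written here in Python; the thresholds, medium names and file description below are illustrative assumptions, not part of any actual Storage Manager interface:

    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class FileInfo:
        name: str
        size_bytes: int
        last_accessed: datetime

    # Hypothetical thresholds an administrator might set at installation time.
    NEARLINE_AFTER = timedelta(days=30)    # idle this long -> rewritable optical
    OFFLINE_AFTER = timedelta(days=180)    # idle this long -> automated tape library

    def choose_medium(f: FileInfo, now: datetime) -> str:
        """Apply pre-defined guidelines to pick the most appropriate medium."""
        idle = now - f.last_accessed
        if idle >= OFFLINE_AFTER:
            return "tape"       # off-line: lowest cost, slowest access
        if idle >= NEARLINE_AFTER:
            return "optical"    # near-line: moderate cost and access speed
        return "hard_disk"      # on-line: highest cost, fastest access

    # A file untouched for a year is written to the tape library.
    assert choose_medium(FileInfo("budget.wks", 40960, datetime(1993, 1, 1)),
                         datetime(1994, 1, 1)) == "tape"

A real system would weigh cost and access-speed requirements together, but the essential decision has this shape: file attributes in, target medium out.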
While this hierarchical structure of the storage server hardware is a critical piece of the solution, it is advanced software that is the key enabling technology required to make this vision a reality. Storage Manager will deliver a variety of services to LAN administrators and create a storage and retrieval hierarchy that maximizes utilization of the available hardware devices. There are three critical objectives in developing truly useful storage management software:

1. provide the administrator with a robust set of automated management tools;

2. provide the user with transparent access to any information in the enterprise, regardless of its physical storage medium; and

3. create a hierarchical storage environment that maximizes existing hardware investments through intelligent storage software and "storage aware" operating systems.

Management Tools

Improving storage management rests on two cornerstone objectives: giving administrators adequate tools, and then automating the process to the greatest extent possible in order to reduce the administrative burden. Ideally, this will evolve into a self-administering storage system, in which the parameters governing backup, file migration and disk grooming are set when the system is installed. After that, the storage management software does the rest, including analyzing problems and choosing the best alternative to address them.

For example, in order to minimize administrative overhead, network processes such as backup -- which includes servers and remote workstations -- will be centrally administered. In fact, we will soon be at the point where backup is eliminated as an application altogether. Rather, it will be integrated into the Storage Manager functionality, which will continuously monitor file usage and proactively, transparently and intelligently ensure that a copy of the data is stored on a medium, and in a place, that provides disaster protection.
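As an illustration only, the continuous protection described above might reduce to an agent of the following general shape; the directory names, polling interval and function names are assumptions, not a description of Storage Manager internals:

    import shutil
    import time
    from pathlib import Path

    PROTECTED_ROOT = Path("/data")         # volumes the agent watches (assumed)
    PROTECTION_ROOT = Path("/protection")  # disaster-protection medium (assumed)

    def ensure_protected(source: Path) -> None:
        """Copy a file to the protection medium unless a current copy exists."""
        copy = PROTECTION_ROOT / source.relative_to(PROTECTED_ROOT)
        if not copy.exists() or copy.stat().st_mtime < source.stat().st_mtime:
            copy.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, copy)     # copy2 preserves timestamps

    def agent_loop(poll_seconds: int = 300) -> None:
        """Monitor file usage continuously; no scheduled 'backup job' remains."""
        while True:
            for f in PROTECTED_ROOT.rglob("*"):
                if f.is_file():
                    ensure_protected(f)
            time.sleep(poll_seconds)

The point of the sketch is the absence of a nightly backup window: protection becomes a standing property of the system rather than a discrete application.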
This network "agent" identifies and notifies the administrator of bottlenecks and the status of critical components that affect system integrity and performance, as well as projecting additional system requirements and solutions to maintain high-performance storage objectives. User Transparency In most cases today when a user requests a file, he must know which server or subdirectory the file is stored on. As storage systems become more complex, this process becomes a cumbersome and unnecessary burden for users. User transparency is a fundamental piece of future storage models. Features such as data cataloging will assist users in searching for specific files by supporting search criteria beyond filenames, creation or modification dates. Data cataloging provides this capability by enabling highly customized and intelligent categorization of data. These advanced features will be enabled by more capable operating systems. Applications such as Storage Manager, in turn, will empower users, allowing them to take advantage of sophisticated storage functions without changing the way they work. From the user perspective, they will continue to save and open documents just as they always have. The changes in the way the information is stored and where it is stored will be isolated from the user by Storage Manager. The user does not know, and has no need to know, which physical device the information is stored on. It is really irrelevant to the user. Different pieces of the same file may be scattered on multiple physical storage devices across the network. The user will simply request a document. Storage Manager will take that request, look up all linkages and references to determine where all of the required pieces of that document are stored and retrieve it for the user as a single document. The user is spared from having to specify which drive, subdirectory or server the document is stored on. In fact, the user won't know where it is. The document could be stored across the hall or across the county; it doesn't matter. Cost Effective Storage Effective data management is dependent on a hierarchical storage system that has the ability to easily migrate inactive files off the hard disk to removable secondary storage; in effect, creating unlimited storage capacity. This design is analogous to moving paper files off your desk to a file cabinet when they aren't needed for the current project you are working on. Only active information is kept on the desktop. When that project is concluded, those folders are relocated to the file cabinet until they are needed again. Files are not moved off the desktop as a backup copy. They are simply moved because it is more efficient and productive to keep only the current information on the desktop. Otherwise, the work environments become too cluttered and difficult to manage. The same concepts apply to electronic data storage. It is far too costly to keep all network data on hard disk storage. Files that are not regularly accessed can be migrated off the disk to secondary storage, a much more cost effective solution. Even though the cost of hard disk storage has fallen dramatically in recent years, it still cannot compete with magnetic tape where several gigabytes of data can be stored on a single tape cassette that costs less than $20. However, one must again factor in the management and administrative cost of such a storage architecture. 
Cost Effective Storage

Effective data management depends on a hierarchical storage system that can easily migrate inactive files off the hard disk to removable secondary storage, in effect creating unlimited storage capacity. This design is analogous to moving paper files off your desk to a file cabinet when they aren't needed for the current project. Only active information is kept on the desktop. When a project is concluded, its folders are relocated to the file cabinet until they are needed again. Files are not moved off the desktop as a backup copy; they are moved because it is more efficient and productive to keep only current information on the desktop. Otherwise, the work environment becomes too cluttered and difficult to manage.

The same concepts apply to electronic data storage. It is far too costly to keep all network data on hard disk storage. Files that are not regularly accessed can be migrated off the disk to secondary storage, a much more cost-effective solution. Even though the cost of hard disk storage has fallen dramatically in recent years, it still cannot compete with magnetic tape, where several gigabytes of data can be stored on a single cassette costing less than $20. However, one must again factor in the management and administrative cost of such a storage architecture. If network managers have to continually examine the disk and manually select and migrate files, the effectiveness of this solution is negated. The requirement, then, is for an intelligent software agent that monitors file activity on the disks and, based on the parameters established by the LAN administrator, moves files to secondary storage as required.

File versioning is another critical element of an unlimited hierarchical storage system. With this capability, a storage system can be established so that users do not overwrite previous versions of a file; instead, every time the file is saved, a new version is created. If part of a file -- or indeed the entire file -- is inadvertently eliminated, the user still has access to a previously saved copy of the data. File versioning also provides a method of tracking the progress of projects and establishing an audit trail of the work performed.

The requirements for automated storage management, user transparency and cost-efficient storage hierarchies will become even more critical as the next generation of operating systems and applications emerges. The current trends are clearly in the direction of object-oriented operating systems and file systems. The concept of a data file belonging to a specific application will be outmoded. This scenario offers the possibility of compound documents -- or objects -- consisting of text generated by a word processing module, pictures generated by a graphics module and numbers produced by a spreadsheet module. The combination of these elements is treated as an object, not as an application-based file. It can also lead to a storage scenario in which the text components are stored on a local disk while the graphics elements are kept on a different, remote server. To manage the storage and migration of files of this complexity, future operating systems must implement a more sophisticated, more intelligent file system architecture. The operating system must be aware of the multiple storage devices, including secondary removable storage, in use throughout the enterprise. Additionally, the operating system must offer robust interfaces to software such as Storage Manager, which will extend the capabilities inherent in the operating system by automating the administration of storage operations.

We have already started to see small signs of this future in current products. Creating compound documents is now possible in both the Windows and Macintosh environments: a word processing document can include spreadsheet information and graphics data, dynamically linked using Publish-and-Subscribe in the Macintosh environment and OLE under Windows. If the resulting compound document is offloaded to secondary storage and one of the original source documents is updated, the operating system must have the intelligence to know, first, that a file on tape may need to be updated and, second, which specific tape contains that information. Resources such as Storage Manager cannot function unless they are enabled at the most basic level by operating systems that are aware of these considerations. Applications such as Storage Manager would simply use those capabilities to perform advanced storage functions, including version control, hierarchical storage, data migration and store-and-forward.

Taking this example a step further, imagine creating a blank new document outside of any specific application. Some spreadsheet numbers are added, as is some text using the word processing tools. Then the document is saved. Is it a spreadsheet file or a word processing file? Actually, it is nothing but a group of pointers referencing the spreadsheet and word processing files that were created. The file saved under the user's chosen name contains none of the data that was entered; it holds only those pointers and references, and the compound document itself records which specific component files they refer to. All of this, of course, is transparent to the user, who sees only the filename given to the compound document, with no awareness of these distributed pieces of the file. When the file is called up, by whatever name it was given, the user sees only the compound document.

The issue is what happens, and what should happen, when the compound document is backed up. If only the compound file is backed up, the copy on tape will contain nothing but the pointers; nothing has been done to secure the actual data. That is why Storage Manager has to be enabled at the operating system level. When a call is made to a backup application to back up a specific file, the operating system, in combination with Storage Manager, has the awareness to step in and back up not just the file containing the pointers, but the constituent pieces of word processing and spreadsheet information as well.
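A sketch of that pointer-following behavior, using invented structures (the pointer-file format, the names and the dictionary standing in for a tape are illustrative assumptions):

    # A compound document holds no data of its own, only references.
    COMPOUND = {
        "annual-plan": ["plan.text", "plan.numbers"],  # word proc. + spreadsheet
    }
    DATA_FILES = {
        "plan.text": b"narrative section",
        "plan.numbers": b"budget figures",
    }

    def backup(name: str, archive: dict) -> None:
        """Back up a file; if it is compound, secure its constituents as well."""
        if name in COMPOUND:
            archive[name] = COMPOUND[name]       # the pointer file itself
            for constituent in COMPOUND[name]:
                backup(constituent, archive)     # recurse into referenced pieces
        else:
            archive[name] = DATA_FILES[name]     # actual data is secured

    tape = {}
    backup("annual-plan", tape)
    assert "plan.numbers" in tape  # constituents captured, not just the pointers

Backing up "annual-plan" alone would have written only the pointer list to tape; the recursion is what secures the underlying word processing and spreadsheet data.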
The storage and management of information is, of necessity, on the verge of radical change. Traditional methods of storing, protecting and retrieving information have changed little, even though the amount and complexity of stored information have grown tremendously. Very simply, the old methods are inadequate. Developing advanced storage hardware and software architectures that harness the available computing power to manage and automate the process is the only solution to this information storage dilemma. Advances in storage hardware have outpaced the evolution of storage software. However, software technology is about to take a major leap ahead. The combination of the next generation of advanced operating systems and software applications specifically designed to automate the management of vast amounts of network data will provide the foundation that gives administrators and users alike "information where it belongs, when you need it."

* * * * * *